At its most fundamental level, remote sensing is a form of data collection about the Earth's surface without direct contact. Because it can provide data at multiple spatial and temporal scales, remote sensing has great potential for environmental monitoring, ecological research and conservation science. You can find interesting examples in Remote Sensing in Ecology and Conservation, a fully open access journal from Wiley.
The basic product of a satellite is an image and, like any other raster file, an image is a 2D matrix of values representing radiance, with each cell representing an area on the ground whose dimensions equal the spatial resolution of the sensor (e.g. 30m x 30m for Landsat multi-spectral bands).
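As a toy illustration of this idea (synthetic brightness values, not real satellite data), a single band is just a matrix:

```r
# a synthetic 4 x 4 single-band "image"; in a Landsat multi-spectral band
# each of these cells would cover a 30m x 30m patch of ground
set.seed(1)
band <- matrix(sample(0:255, 16, replace = TRUE), nrow = 4)
dim(band)    # 4 rows x 4 columns of pixels
range(band)  # the brightness values stored in the cells
```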
Properties of satellite imagery include its spatial, spectral, temporal and radiometric resolution.
Based on these properties, satellites can be divided into different categories, such as high-resolution satellites (with spatial resolution of 2.5m or finer) and hyper-spectral satellites (with more than 30 spectral bands). A brief tutorial on Fundamentals of Remote Sensing provided by The Canada Centre for Mapping and Earth Observation is a good source for basic background on remote sensing (e.g. different types of satellites, the characteristics of the electromagnetic spectrum and the impact of the atmosphere on radiance). For more details on processing remotely sensed imagery see Mather, P.M. and M. Koch (2011).
You can access satellite data from several sources, including:
A spectral signature is the variation in reflectance or emittance of a material with respect to wavelength (i.e., reflectance/emittance as a function of wavelength).
Spectral signature plot (from www.intechopen.com)
As can be seen, each feature has a distinct pattern and behaves differently across the electromagnetic spectrum. This provides the basis for distinguishing different objects in an image.
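To make the idea concrete, here is a minimal sketch that plots hypothetical spectral signatures; the reflectance values below are illustrative only, not measured data:

```r
# illustrative reflectance curves for three surface types (made-up values)
wavelength <- c(0.45, 0.55, 0.65, 0.85, 1.65, 2.20)  # micrometres
vegetation <- c(0.05, 0.10, 0.05, 0.50, 0.30, 0.15)  # red dip, NIR peak
water      <- c(0.08, 0.06, 0.04, 0.01, 0.00, 0.00)  # absorbs NIR strongly
soil       <- c(0.10, 0.15, 0.20, 0.25, 0.35, 0.40)  # rises with wavelength
plot(wavelength, vegetation, type = "b", col = "darkgreen", ylim = c(0, 0.6),
     xlab = "Wavelength (micrometres)", ylab = "Reflectance")
lines(wavelength, water, type = "b", col = "blue")
lines(wavelength, soil, type = "b", col = "brown")
legend("topleft", legend = c("vegetation", "water", "soil"),
       col = c("darkgreen", "blue", "brown"), lty = 1)
```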
The Landsat program is the longest-running enterprise for acquisition of satellite imagery of Earth. It is a joint effort of the U.S. Geological Survey (USGS) and the National Aeronautics and Space Administration (NASA). They have successfully launched seven satellites since July 1972, and the most recent, Landsat 8, was launched on February 11, 2013. With its fine resolution and long time span, Landsat imagery has considerable potential for environmental monitoring and change detection.
Some characteristics of Landsat 7 & 8 satellites and imagery:
Landsat 7 & 8 spectral cover:
ETM+ versus OLI-TIRS (from www.landsat.gsfc.nasa.gov)
All basic raster operations can be performed on satellite imagery, but note that any operation that changes the original pixel values may affect the results of subsequent image processing. For example, resampling to another cell size, creating an image mosaic or changing the projection system can change the pixel values when a resampling method other than Nearest Neighbour is used. It is therefore recommended to carry out processing that depends heavily on spectral values, such as vegetation analysis, before applying any of these operations.
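As a small sketch of this effect (using a made-up raster rather than real imagery), compare Nearest Neighbour with bilinear resampling:

```r
library(raster)
# a toy 10 x 10 raster with known integer values
r <- raster(nrows = 10, ncols = 10, xmn = 0, xmx = 10, ymn = 0, ymx = 10,
            vals = 1:100)
# a coarser 5 x 5 target grid over the same extent
target <- raster(extent(r), nrows = 5, ncols = 5)
ngb <- resample(r, target, method = "ngb")       # Nearest Neighbour
bil <- resample(r, target, method = "bilinear")  # bilinear interpolation
# Nearest Neighbour only copies existing cell values...
all(values(ngb) %in% values(r))  # TRUE
# ...while bilinear interpolation creates new, averaged values
all(values(bil) %in% values(r))
```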
Some useful packages for analyzing satellite imagery and products:
RStoolbox
raster
landsat
rLandsat8
You can import satellite imagery as a raster file with the raster(), stack() or brick() functions in the raster package. Since these images contain several bands, it is easier to load all bands at once.
# loading libraries
library(RStoolbox)
library(raster)
library(ggplot2)
# Import several raster files as a RasterStack
setwd("/Users/rvalavi/Desktop/RScodingclub/landsat8")
bandList <- list.files(getwd(), pattern="\\.tif$", full.names=FALSE) # R is case sensitive!
img <- stack(bandList)
To show image properties (resolution, number of bands etc.), just print the image object.
img
## class : RasterStack
## dimensions : 2834, 2814, 7974876, 8 (nrow, ncol, ncell, nlayers)
## resolution : 30, 30 (x, y)
## extent : 258645, 343065, -4230645, -4145625 (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=utm +zone=55 +datum=WGS84 +units=m +no_defs +ellps=WGS84 +towgs84=0,0,0
## names : LC08_B1, LC08_B2, LC08_B3, LC08_B4, LC08_B5, LC08_B6, LC08_B7, LC08_B9
## min values : 8239, -23286, 6404, -30686, -30704, -32625, -32715, 4951
## max values : 25115, 25817, 30771, 28302, 31537, 32528, 32721, 5193
Functions like nlayers(), ncell(), dim(), extent() and res() can be used to access different properties of a raster file.
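For example, using the img stack loaded above:

```r
nlayers(img)  # number of bands in the stack
res(img)      # cell size (x, y) in map units
dim(img)      # rows, columns and layers
extent(img)   # bounding box of the image
```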
There is an alternative way to read Landsat imagery, using the metadata (header) file (MTL.txt) with the RStoolbox package. The advantage of this approach is that all the other information available in the Landsat metadata file is read in for further use.
# import separate Landsat files into single RasterStack
mtlFile <- "/Users/rvalavi/Desktop/Images/LC08_2017_092086/LC08_L1TP_092086_20170122_20170311_01_T1_MTL.txt"
myImg <- stackMeta(mtlFile, quantity = "all", category = "image", allResolutions = FALSE)
plot(myImg)
Notice there is no band 8 in this RasterStack. That is because band 8 in Landsat imagery is the panchromatic band, which has a different spatial resolution from the multi-spectral bands (15m rather than 30m). The dn in each band name indicates that the values are Digital Numbers, the raw values of satellite imagery. RStoolbox identifies this and, since it has functions to convert DN to other quantities such as radiance and reflectance, the suffix is added to each band name.
Satellite bands can be plotted individually and are normally shown in gray-scale. However, we can use three bands to visualise them as a colour image: a colour image consists of a red, a green and a blue channel. This can be done using plotRGB() in the raster package.
# plot in grayscale and colour image
par(mfrow=c(1,2))
plotRGB(img, r = 4, g = 3, b = 2, axes=TRUE, stretch="lin", main = "True Colour Composite")
plot(img[[4]], col=gray.colors(256, start=0, end=1), main="Single band grayscale")
This raster contains values between 0 and 65535. These values represent degrees of brightness in the image band, determined by its radiometric resolution: a sensor with n-bit radiometric resolution records \(2^n\) brightness levels. For Landsat 8, with 16-bit radiometric resolution, that is \(2^{16} = 65536\) levels (0 to 65535), compared with 256 levels (0 to 255) for the 8-bit radiometric resolution of Landsat 7. The larger the range, the better objects can be distinguished in an image. When we plot the red band, larger numbers (towards 65535) represent pixels with a strong red reflection, while smaller numbers (towards 0) represent pixels from which less red was reflected. To plot an RGB image, we mix the red, green and blue values into one single colour to create a full colour image, similar to the colour image a digital camera creates. Different band combinations can be used to enhance different objects in the colour image (see the spectral signature plot).
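As a quick check of the \(2^n\) relationship (the brightness_levels helper below is just for illustration):

```r
# number of brightness levels for an n-bit sensor
brightness_levels <- function(n_bits) 2^n_bits
brightness_levels(8)   # 256 levels (0-255), e.g. Landsat 7
brightness_levels(16)  # 65536 levels (0-65535), e.g. Landsat 8
```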
The RStoolbox package has an option to plot colour images using ggplot, so you can easily add other features to the plot.
# false colour composite with ggplot
# I changed x and y label here
ggRGB(img, r=5, g=4, b=3, stretch="lin") + xlab("Longitude") + ylab("Latitude")
To add other features, the ggplot2 package should be loaded into the global environment using library(ggplot2).
# my Shiny app for interactive RGB plotting
# RGBplot(img)
Vegetation Indices (VIs) are spectral transformations of two or more bands designed to enhance the contribution of vegetation properties; they allow vegetation to be identified and provide a measure of its health and vigour. One of the most commonly used VIs is the Normalized Difference Vegetation Index (NDVI), defined as:
\[NDVI = \frac{(NIR - Red)}{(NIR + Red)}\] where NIR is the Near-Infrared band. Regardless of the values used in this equation, NDVI always falls between -1 and 1.
While NDVI is primarily used for studying vegetation, it can also help identify other features in an image. For example, water has negative NDVI values. In regions with sparse vegetation cover (a strong soil background) or very dense cover, NDVI tends to perform poorly, and it is recommended to use other VIs that account for these issues (e.g. the Enhanced Vegetation Index, EVI). However, because of its ease of use and relationship to many ecosystem parameters, NDVI has seen widespread use in environmental studies.
We can create a function for calculating NDVI.
# NDVI function
NDVI <- function(img, NIR, Red){
ndvi <- (img[[NIR]] - img[[Red]]) / (img[[NIR]] + img[[Red]])
return(ndvi)
}
Now, we can use this function to compute NDVI for our image.
# in Landsat8 imagery, band 5 is NIR and band 4 is red
melbourne_ndvi <- NDVI(img, 5, 4)
# plot the result
par(mfrow=c(1,2))
plotRGB(img, r = 5, g = 4, b = 3, axes=TRUE, stretch="lin", main = "False Colour Composite")
plot(melbourne_ndvi, main="NDVI of Melbourne")
As you can see, vegetation cover (red in the false colour composite) has higher NDVI values (green areas).
NDVI can also be used for change detection with imagery of the same region at different times, for climate change studies, biomass estimation and carbon sequestration, and as an input feature for vegetation and land cover classification. Other indices are available for features such as burnt areas, snow, water, etc.
MODIS vegetation indices products (NDVI and EVI), produced on 16-day intervals (mean NDVI) and at multiple spatial resolutions (250m, 500m and 1km), provide consistent spatial and temporal comparisons of vegetation canopy greenness, a composite property of leaf area, chlorophyll and canopy structure.
Some of these indices need extra information beyond the spectral bands; these are listed in the package manual with their default values.
To get the full set of VIs offered by RStoolbox, you should convert the DN values to reflectance by applying radiometric calibration. This is done using information derived from the metadata.
# Conversion parameters from DN to reflectance (Landsat8 only)
library(pander)   # for the markdown table printed below
library(magrittr) # for the %>% pipe
meta <- readMeta(mtlFile) # read metadata for our previous image
getMeta(myImg[[1:8]], metaData = meta, what = "CALREF") %>% pander
|   | offset | gain |
|---|---|---|
| B1_dn | -0.1 | 2e-05 |
| B2_dn | -0.1 | 2e-05 |
| B3_dn | -0.1 | 2e-05 |
| B4_dn | -0.1 | 2e-05 |
| B5_dn | -0.1 | 2e-05 |
| B6_dn | -0.1 | 2e-05 |
| B7_dn | -0.1 | 2e-05 |
| B9_dn | -0.1 | 2e-05 |
# Radiometric Calibration and Correction
OLIcor <- radCor(myImg, meta, method = "apref", bandSet = "full",
                 atmosphere = "clear", darkProp = 0.01, clamp = TRUE, verbose = TRUE)
Notice that this processing changes the band names.
# Calculate all possible indices, given the provided bands
# Convert DNs to reflectance to calculate EVI and EVI2
VIs <- spectralIndices(OLIcor, blue='B2_tre', green='B3_tre', red='B4_tre', nir='B5_tre')
plot(VIs)
I highly recommend looking at these functions' help pages with help(spectralIndices) or ?spectralIndices for more detail on their parameters and requirements.
We can use a threshold to extract specific objects from an image. This can be done by setting a threshold on a raw band, although it is often more effective to apply the threshold to a derived feature such as a vegetation index.
# apply 0.2 threshold on NDVI for extracting vegetation cover
veg <- calc(melbourne_ndvi, function(x){x[x < 0.2] <- NA; return(x)})
# plot the result to compare with false colour composite
par(mfrow=c(1,2))
plotRGB(img, r = 5, g = 4, b = 3, axes=FALSE, stretch="lin")
plotRGB(img, r = 3, g = 3, b = 3, axes=FALSE, stretch="lin") # gray-scale with just 1 band
plot(veg, col="darkgreen", legend=FALSE, add=TRUE) # add on top of the second plot
In the next example, I want to extract a burnt area based on NDVI change detection from two images, before and after a fire. The study area is located in the Pasargad region near the city of Shiraz in Iran obviously ;)
The process is as follows:
# change working directory to imagery location
setwd("/Users/rvalavi/Desktop/Pasargad")
# reading Landsat 8 imagery - before the fire
mtlFileImage1 <- "LC81620392016158LGN00/LC81620392016158LGN00_MTL.txt"
image1 <- stackMeta(mtlFileImage1, quantity="all", category="image", allResolutions=FALSE)
# reading Landsat 8 imagery - after the fire
mtlFileImage2 <- "LC81620392016174LGN00/LC81620392016174LGN00_MTL.txt"
image2 <- stackMeta(mtlFileImage2, quantity="all", category="image", allResolutions=FALSE)
# check acquisition dates from metadata file
metaImage1 <- readMeta(mtlFileImage1)
metaImage2 <- readMeta(mtlFileImage2)
print(metaImage1$ACQUISITION_DATE) # before the fire
print(metaImage2$ACQUISITION_DATE) # after the fire
## [1] "2016-06-06 06:57:01 GMT"
## [1] "2016-06-22 06:57:05 GMT"
Now it is better to crop the data around the burnt area, since Landsat imagery covers a vast area and running all the processing on the entire scene can be computationally expensive.
# crop imagery to reduce computation time
ext <- extent(687177.6, 697412.5, 3340035, 3348913) # xmin, xmax, ymin, ymax
img1 <- crop(image1, ext)
img2 <- crop(image2, ext)
# plot before and after imagery
par(mfrow=c(1,2))
plotRGB(img1, r=5, g=4, b=3, stretch="lin", main="before")
plotRGB(img2, r=5, g=4, b=3, stretch="lin", main="after")
This can also be done interactively using the three interactive functions in the raster package, with which you can use your mouse cursor to draw a line, polygon or extent on a plotted map (it should be a base plot, not ggplot etc.): drawLine(), drawPoly() and drawExtent() respectively.
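A minimal sketch of the interactive workflow (the calls are commented out because they need an interactive graphics device and mouse input):

```r
# run in an interactive R session, with the image plotted first:
# plotRGB(image1, r = 5, g = 4, b = 3, stretch = "lin")
# ext <- drawExtent()        # click two opposite corners on the map
# img1 <- crop(image1, ext)  # crop to the drawn extent
```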
Now we should subtract the two NDVI layers.
# NDVI of image 1
ndvi1 <- NDVI(img1, 5, 4)
# NDVI of image 2
ndvi2 <- NDVI(img2, 5, 4)
# subtract NDVIs
burn <- ndvi2 - ndvi1
plot(burn)
The threshold can be found by trial and error, or drawPoly() can be used to extract a small part of the burnt area and compute its summary statistics to select a cut-off value.
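A sketch of that second approach (again commented out, since it needs mouse input):

```r
# digitise a small polygon inside a known burnt patch and inspect its values:
# plot(burn)
# patch <- drawPoly()                # draw around an obviously burnt area
# vals  <- unlist(extract(burn, patch))
# summary(vals)                      # e.g. use the maximum as the cut-off
```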
# extracting the object
burnArea <- calc(burn, function(x){x[x > -0.02004] <- NA; return(x)})
# plot and check the result
par(mfrow=c(1,2))
plotRGB(img2, r = 5, g = 4, b = 3, axes=FALSE, stretch="lin")
plotRGB(img2, r = 5, g = 4, b = 3, axes=FALSE, stretch="lin") # plotted again so the burnt area can be overlaid
plot(burnArea, col="darkgreen", legend=FALSE, add=TRUE)
Unsupervised classification is where the outcome groups (clusters of pixels with common characteristics) are determined by a clustering algorithm such as k-means. The user only specifies which algorithm the software will use and the desired number of output classes. However, the user still needs knowledge of the area being classified, because the groups of pixels identified by the computer have to be related to actual objects on the ground (such as wetlands, built-up areas, forests, farmland, etc.).
# Unsupervised classification
USC <- unsuperClass(img, nClasses = 6, nSamples = 10000, nStarts = 25, nIter = 100,
norm = FALSE, clusterMap = TRUE, algorithm = "Hartigan-Wong")
# the output object has a map as a subset
plot(USC$map)
Supervised classification is based on the idea that a user can select sample pixels in an image that are representative of specific classes, and then use a statistical or machine learning algorithm for which these training sites serve as references when classifying all other pixels in the image. Some people use a hybrid method, combining supervised and unsupervised classification, to develop the final classified maps.
# change working directory to image location
setwd("/Users/rvalavi/Desktop/Shiraz")
# import 7 bands in a single source file
img <- brick('Im.tif') # you can save more than one raster layer in a single TIF file
# rename the bands to their real names
names(img) <- c('B1', 'B2', 'B3', 'B4', 'B5', 'B6', 'B7')
# import training samples as shapefile
training <- shapefile("trainingset.shp")
# plot points on map
ggRGB(img, r=5, g=4, b=3, stretch="lin") + xlab("Longitude") + ylab("Latitude") +
geom_point(aes(x = X, y = Y, color=Class), data = training@data,
alpha = 0.6, size = 3)
This step is just for exploration and checking; RStoolbox automatically extracts the raster values during the classification process.
# add NDVI as a feature for classification
img$ndvi <- NDVI(img, 5, 4) # use our previous function to compute ndvi
# extract band values
trainingRS <- extract(img, training, df=TRUE, na.rm=TRUE)
trainingRS$Class <- training$Class # add classes to band value table
head(trainingRS) %>% pander
| ID | B1 | B2 | B3 | B4 | B5 | B6 | B7 | ndvi | Class |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 9574 | 8801 | 8640 | 7905 | 21411 | 11963 | 8655 | 0.4607 | vegi |
| 2 | 9525 | 8753 | 8604 | 7671 | 20590 | 12071 | 8769 | 0.4571 | vegi |
| 3 | 9612 | 8793 | 8709 | 7737 | 21705 | 12730 | 8987 | 0.4744 | vegi |
| 4 | 9791 | 9063 | 9154 | 8515 | 22930 | 13270 | 9611 | 0.4584 | vegi |
| 5 | 9756 | 9060 | 9175 | 8508 | 21454 | 13018 | 9627 | 0.4321 | vegi |
| 6 | 9798 | 9092 | 9457 | 8737 | 24071 | 14003 | 10141 | 0.4674 | vegi |
Since DN values of objects in nearby spectral bands are more similar than in distant bands (I call it Spectral Autocorrelation), there will be high correlation between adjacent bands on the electromagnetic spectrum. We should check for multi-collinearity to avoid data redundancy.
# exploring and plotting the data
library(psych)
pairs.panels(trainingRS[2:9], hist.col="lightblue", scale = TRUE)
Correlated features (bands or covariates) should be removed before classification. However, before removing any band, it is recommended to check the separability of the classes in different band/feature pairs in order to select a better subset of features. This can be done by plotting each pair and colouring the points by class.
# explore band 1 vs band 7
qplot(B1, B7, colour=Class, data=trainingRS)
As you can see our classes are almost completely separated in B1 and B7, so they can be classified easily.
Another approach is Principal Component Analysis (PCA), which reduces the number of features and creates uncorrelated features capturing the highest variance in the data. RStoolbox has a function to apply PCA to raster data.
PCA <- rasterPCA(img, nComp=3, maskCheck=TRUE)
PCA$model # print the statistics of PCs
## Call:
## princomp(cor = spca, covmat = covMat[[1]])
##
## Standard deviations:
## Comp.1 Comp.2 Comp.3 Comp.4 Comp.5
## 8.992547e+03 6.439133e+03 1.905915e+03 1.128823e+03 7.803053e+02
## Comp.6 Comp.7 Comp.8
## 2.848851e+02 1.202687e+02 1.215665e-02
##
## 8 variables and 1101186 observations.
The superClass() function uses the caret package in the background for classification, so you can use any classification algorithm available in caret here. RStoolbox classification takes a SpatialPolygonsDataFrame or SpatialPointsDataFrame as training data.
# Classification using a machine learning model (rf, RRF, svmRadial, nnet, mlp, C5.0, rpart)
sclass <- superClass(PCA$map, trainData = training, responseCol = "Class",
model = 'rf', tuneLength = 1, trainPartition = 0.8)
# using rasterVis library for representation of categorical map
library(rasterVis)
mapTheme <- rasterTheme(region=brewer.pal(7,"Paired")) # create a colour palette
levelplot(sclass$map, par.settings =mapTheme, main="Random Forest Classification")
Check the confusion matrix and statistics of the validation set. You can access the training and validation datasets in the classification object. However, due to the random splitting of the training and validation data, the accuracy estimate might be optimistic.
# show the model validation
sclass$validation$performance
## Confusion Matrix and Statistics
##
## Reference
## Prediction builtup1 builtup2 saltmarsh soil vegi wetland
## builtup1 25 0 0 1 0 0
## builtup2 0 38 0 0 0 0
## saltmarsh 0 0 8 0 0 0
## soil 1 0 0 34 0 0
## vegi 0 0 0 0 26 0
## wetland 0 0 0 0 0 11
##
## Overall Statistics
##
## Accuracy : 0.9861
## 95% CI : (0.9507, 0.9983)
## No Information Rate : 0.2639
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.9826
## Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
## Class: builtup1 Class: builtup2 Class: saltmarsh
## Sensitivity 0.9615 1.0000 1.00000
## Specificity 0.9915 1.0000 1.00000
## Pos Pred Value 0.9615 1.0000 1.00000
## Neg Pred Value 0.9915 1.0000 1.00000
## Prevalence 0.1806 0.2639 0.05556
## Detection Rate 0.1736 0.2639 0.05556
## Detection Prevalence 0.1806 0.2639 0.05556
## Balanced Accuracy 0.9765 1.0000 1.00000
## Class: soil Class: vegi Class: wetland
## Sensitivity 0.9714 1.0000 1.00000
## Specificity 0.9908 1.0000 1.00000
## Pos Pred Value 0.9714 1.0000 1.00000
## Neg Pred Value 0.9908 1.0000 1.00000
## Prevalence 0.2431 0.1806 0.07639
## Detection Rate 0.2361 0.1806 0.07639
## Detection Prevalence 0.2431 0.1806 0.07639
## Balanced Accuracy 0.9811 1.0000 1.00000